JWPL

JWPL is a language-independent, database-driven, high-performance Wikipedia API that provides structured access to information such as redirects, categories, articles, and the link structure. It also contains three further components. The first is a MediaWiki markup parser, which can be used to analyze the contents of a Wikipedia page in more depth or applied standalone to other text. The second is TimeMachine, which reconstructs a snapshot of Wikipedia as of a specific date, or multiple snapshots over a time span. The third is RevisionMachine, which offers efficient access to the history of articles through a dedicated storage format that reduces the required storage space by 98%. This enables random access to the whole revision history without requiring several terabytes of storage for a single Wikipedia dump.
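To make the storage idea concrete, here is a minimal Java sketch of one common way to store revision histories compactly. It keeps every k-th revision in full and stores each revision in between as a delta against its predecessor. This is an assumption for illustration only. It is not RevisionMachine's actual storage format or API, and the class `DeltaRevisionStore`, its methods, and the single-region prefix/suffix diff are all hypothetical. Retrieving revision i only requires replaying the deltas from the nearest full revision, which keeps random access cheap.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, NOT RevisionMachine's real format: every k-th revision
 * is stored in full, all others as a single-region delta against the previous one.
 */
public class DeltaRevisionStore {

    /** Keep `prefix` leading and `suffix` trailing chars of the old text; replace the rest with `middle`. */
    private record Delta(int prefix, int suffix, String middle) {}

    private final int checkpointInterval;
    private final List<Object> entries = new ArrayList<>(); // each entry is a full String or a Delta
    private String lastText = null;

    public DeltaRevisionStore(int checkpointInterval) {
        if (checkpointInterval < 1) throw new IllegalArgumentException("interval must be >= 1");
        this.checkpointInterval = checkpointInterval;
    }

    /** Appends the next revision, storing it in full at checkpoints and as a delta otherwise. */
    public void add(String text) {
        if (entries.size() % checkpointInterval == 0) {
            entries.add(text);
        } else {
            entries.add(diff(lastText, text));
        }
        lastText = text;
    }

    /** Random access: start at the nearest full revision and replay the deltas up to `index`. */
    public String get(int index) {
        if (index < 0 || index >= entries.size()) throw new IndexOutOfBoundsException(index);
        int base = index - index % checkpointInterval;
        String text = (String) entries.get(base);
        for (int i = base + 1; i <= index; i++) {
            text = apply(text, (Delta) entries.get(i));
        }
        return text;
    }

    public int size() {
        return entries.size();
    }

    /** Rough storage cost in characters: full texts plus the replaced middles of deltas. */
    public long storedChars() {
        long total = 0;
        for (Object e : entries) {
            total += (e instanceof String s) ? s.length() : ((Delta) e).middle().length();
        }
        return total;
    }

    /** Computes the longest common prefix and (non-overlapping) suffix of the two texts. */
    private static Delta diff(String oldText, String newText) {
        int max = Math.min(oldText.length(), newText.length());
        int prefix = 0;
        while (prefix < max && oldText.charAt(prefix) == newText.charAt(prefix)) prefix++;
        int suffix = 0;
        while (suffix < max - prefix
                && oldText.charAt(oldText.length() - 1 - suffix)
                   == newText.charAt(newText.length() - 1 - suffix)) suffix++;
        return new Delta(prefix, suffix, newText.substring(prefix, newText.length() - suffix));
    }

    /** Rebuilds the new text from the old text and a delta. */
    private static String apply(String oldText, Delta d) {
        return oldText.substring(0, d.prefix())
                + d.middle()
                + oldText.substring(oldText.length() - d.suffix());
    }
}
```

The checkpoint interval is the main design knob in this sketch: a larger interval stores fewer full texts but requires replaying more deltas per lookup. Practical diff formats usually encode several changed regions per revision rather than the single region used here.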

Recent releases

  •  21 Feb 2012 01:11

Release Notes: This release fixes a bug in the API that prevented fetching inlink IDs. Several improvements have been made to Hibernate session handling.

  •  09 Feb 2012 22:38

Release Notes: JWPL Core now depends on Hibernate 4.0.0-final. The PageIterator can now iterate over a predefined list of pages. All components of the RevisionMachine can now produce data files in addition to SQL dumps. A severe error in the DiffTool that caused exceptions when creating a new revision dump has been fixed.
